Pre-processing for video compression

ABSTRACT

A processing system filters blocks of a picture to minimize a size and error of the blocks prior to encoding. A pre-processing module of the processing system measures characteristics of a plurality of blocks and evaluates the effects of applying each of a plurality of filters to the blocks prior to encoding in order to predict an increase in compressibility of blocks having similar characteristics that are filtered with each filter before being encoded, with the least impact on quality. The pre-processing module trains models to predict a size and error of blocks filtered with each filter based on block characteristics. The pre-processing module uses the models to calculate a cost in terms of size and error of applying each filter to a given block having certain characteristics. The pre-processing module then applies to the block the filter that is predicted to result in the best cost.

BACKGROUND

A multimedia server generates data representative of pictures in a multimedia stream, e.g., a multimedia stream that has been requested by a user. An encoder of the multimedia server encodes the data for each picture to form a bitstream that is transmitted over a network to a decoder that decodes the bitstream and provides the decoded video information to a multimedia application or any other application for display to the user. Such multimedia encoders and decoders are used in a wide variety of applications to facilitate the storage and transfer of multimedia streams in a compressed fashion.

To compress multimedia streams, conventional encoders implement video compression algorithms in which the degree of compression depends in part on a quality parameter such as a quantization parameter. A quantization parameter is a number that can be used to derive a standard matrix for quantizing transformed data in a codec. A higher quantization parameter often results in lower bit usage for a picture, whereas a lower quantization parameter often results in higher bit usage for the picture. Compression algorithms use different quantization parameters that affect the allocation of bits to tiles, frames, slices, and blocks of pictures. However, using a quantization parameter that is too low results in the unnecessary consumption of computing resources and bandwidth in encoding, transmitting, and decoding of pictures, without any commensurate benefit. On the other hand, using a quantization parameter that is too high results in unnecessarily (or unacceptably) reduced quality of encoded pictures. In addition, changing the quantization parameter of a picture can lead to an unpredictable loss in quality.

BRIEF DESCRIPTION OF THE DRAWINGS

The present disclosure may be better understood, and its numerous features and advantages made apparent to those skilled in the art by referencing the accompanying drawings. The use of the same reference symbols in different drawings indicates similar or identical items.

FIG. 1 is a block diagram of a processing system that includes a pre-processing module to select and apply a pre-processing filter to a block of a picture to be encoded based on a cost analysis in accordance with some embodiments.

FIG. 2 is an illustration of the effects of varying the spatial scales of two sample blocks in accordance with some embodiments.

FIG. 3 is an illustration of the effects of varying the dynamic ranges of two sample blocks in accordance with some embodiments.

FIG. 4 is a block diagram of a filter control of the pre-processing module applying filters having different settings to a portion of a picture in accordance with some embodiments.

FIG. 5 is a flow diagram illustrating a method for training models to predict sizes and distortion of blocks filtered with different filters in accordance with some embodiments.

FIG. 6 is a flow diagram illustrating a method for filtering blocks of a picture using a filter having a lowest cost for each block in accordance with some embodiments.

DETAILED DESCRIPTION

FIGS. 1-6 illustrate processing systems and techniques for pre-processing a portion, or block, of a picture using a filter to minimize a size and error of the block prior to encoding of a multimedia stream. A pre-processing module of the processing system measures characteristics of a plurality of blocks using a plurality of metrics and evaluates the effects of applying each of a plurality of filters (or a filter with a plurality of settings) to the blocks prior to encoding in order to predict an increase in compressibility of blocks having similar characteristics that are filtered with each filter before being encoded using a given quantization parameter, with the least impact on quality. The pre-processing module develops and trains models to predict a size and error of blocks filtered with each of the plurality of filters based on characteristics of the blocks. The pre-processing module uses the models to calculate a cost in terms of size and error resulting from applying each filter to a given block of a picture based on characteristics of the block. The pre-processing module then applies to the block the filter that is predicted to result in the best cost (e.g., the smallest size (i.e., lowest bit usage) while having the least impact on quality). By pre-processing the blocks of a picture by applying a filter calculated to result in a low cost, the pre-processing module increases the compressibility of the picture when encoded using a given quantization parameter with a predictable amount of quality loss.

Blocks having different characteristics will be affected differently when filtered with a given filter. The pre-processing module characterizes each block of a picture by measuring multiple characteristics, such as colorfulness, contrast, or noisiness, of the block. In some embodiments, the pre-processing module characterizes the block at a plurality of spatial compression scales and/or dynamic range scales by calculating a gradient of the block at each spatial scale and/or dynamic range to generate a “multi-scale metric” of the block. Each block includes a degree of fine or coarse detail, contrast, structures, color, and brightness. Pixel activity such as variations of pixel intensities within a portion may be used to detect edges, repeating patterns, and other structures or objects in an image. Pixel activity can be measured using various metrics such as, for example, a gray-level co-occurrence matrix, a two-dimensional spatial mean gradient, wavelet or other transforms, a discrete cosine function, or an average value within a portion.

A single measurement of pixel activity such as a 2D spatial gradient or wavelet transform can result in similar measurements for blocks that are actually dissimilar, such as a block including a carpet versus a block including a plain sheet of white paper. By analyzing the pixel activity level of each block at a plurality of spatial scales and/or dynamic ranges, the pre-processing module generates a robust characterization of the block that indirectly indicates how bit allocation or assignment of a given QP will affect the perceptual quality of the block. In some embodiments, the multi-scale metric includes an indication of the presence of skin tone within a block. Blocks that have similar multi-scale metrics, as measured by, e.g., Cartesian difference between multi-scale metrics, are likely to be visually similar and to be similarly affected by application of a given filter having particular control settings.

In some embodiments, rather than apply all N filters to each block and then select the filter having the lowest cost, the pre-processing module trains a model using a large number of blocks from selected training videos to predict a cost for each filter when applied to a particular block. To train the model, the pre-processing module characterizes each of the blocks from the training videos using the multi-scale metric or another measure of characteristics of each block. The pre-processing module applies a first filter, having first control settings, to each block of the training videos and encodes the filtered blocks with a specific quantization parameter using a targeted encoder. The pre-processing module calculates the size (i.e., bit usage) and an error metric to measure distortion introduced by filtering for each encoded filtered block of the training videos. Based on the characteristics, calculated sizes, and calculated errors for each of the blocks, the pre-processing module develops a regressive model, such as least squares, random forest regressor, or other machine learning model, to predict a size of a given block having particular characteristics when filtered using the first filter and encoded using the specific quantization parameter. The pre-processing module similarly develops a regressive model to predict an error for a given block having particular characteristics when filtered using the first filter and encoded using the specific quantization parameter. The pre-processing module then repeats the process of applying a filter, encoding, calculating sizes and errors, and developing a regressive model for each of the remaining N−1 filters. In some embodiments, the pre-processing module also repeats the process of applying a filter, encoding, calculating sizes and errors, and developing a regressive model for any additional quantization parameters that may be used for encoding the blocks.

Once the pre-processing module has developed a model for each filter and/or quantization parameter of interest, the pre-processing module applies the models and calculates a cost for each filter option to select a lowest cost filter for pre-processing blocks of a picture of interest. For each block of a picture, the pre-processing module characterizes the block and then calculates a predicted size for the block when filtered with each of the N filters, based on the block's characteristics and the size model, and calculates a predicted error for the block when filtered with each of the N filters, based on the block's characteristics and the error model. For each of the N filters, the pre-processing module calculates a cost that is a function of the predicted size and error that would result from applying the filter to the block, based on the block's characteristics. In some embodiments, the pre-processing module applies a filter to a block in response to the cost of the filter being below a threshold. In some embodiments, the pre-processing module compares the calculated costs for each of the filters and selects the filter having the best (least) cost to apply to the current block. The pre-processing module repeats the process of applying the models and calculating costs to select the lowest cost filter for each block of the picture. After pre-processing the picture by applying the lowest cost filter to each block of the picture, the pre-processing module provides the filtered picture to the encoder for encoding. The encoder encodes each block of the picture using a specified quantization parameter. The encoder transmits the encoded picture over a network to a decoder that decodes the bitstream and provides the decoded video information to a multimedia application for display to the user. Pre-processing pictures with low-cost filters reduces bit usage (i.e., increases compressibility) with little or no visible effect.

FIG. 1 illustrates a processing system 100 that includes a pre-processing module 110 to select and apply a pre-processing filter 120 to a block of a picture based on a cost analysis in accordance with some embodiments. The pre-processing module 110 may be implemented as hard-coded logic, programmable logic, software executed by a processor, or a combination thereof. In some embodiments, the processing system 100 is distributed across a variety of electronic devices, such as a server, personal computer, tablet, set top box, gaming system, mobile phone, and the like.

The pre-processing module 110 includes a characterization module 115, a plurality of filters 120, a filter control 125, a predictor 130, a cost calculator 135, and a filter selector 140, each of which may be implemented as hard-coded logic, programmable logic, software executed by a processor, or a combination thereof. The processing system 100 receives digital information that represents a stream or sequence of pictures in a multimedia stream. The term “multimedia” refers to a stream of data including video data, audio data, and the like, or a combination thereof, and in some embodiments also includes control data, metadata, and the like, or any combination thereof. The processing system 100 divides each picture into coding units such as macroblocks, coding tree blocks (CTBs), tiles, and slices, referred to herein as “blocks”, which are provided to the pre-processing module 110.

The characterization module 115 analyzes each block and characterizes the blocks according to metrics such as spatial gradient, colorfulness, contrast, or noisiness. In some embodiments, the characterization module 115 characterizes the blocks using a multi-scale metric that indicates pixel activity of the block measured at a plurality of spatial scales and dynamic ranges. In such embodiments, the characterization module 115 employs a video/image scaler (not shown) that adjusts the spatial scale of each block and a dynamic range modulator (not shown) that adjusts the dynamic range of each block while the characterization module 115 calculates pixel activity for each block at each spatial scale and dynamic range setting. “Spatial scale” refers to the number of pixels represented by the block, and a pixel is the smallest addressable element in a display device. For example, in some embodiments, the video/image scaler rescales each block to a plurality of spatial scale settings, such as 1:1, 2:1, 4:1, and 8:1, such that for a block of 16×16 pixels (i.e., a macroblock), at a 1:1 spatial scale, the block is unaltered (16×16 pixels), at a 2:1 spatial scale, the original 16×16 block is compressed to 8×8 pixels, at a 4:1 spatial scale, the block is compressed to 4×4 pixels, and at an 8:1 spatial scale, the block is compressed to 2×2 pixels. “Dynamic range” refers to the number of tonal values of a pixel. For example, in some embodiments, the dynamic range modulator applies a plurality of dynamic range settings, such as 1:1, 2:1, 4:1, and 8:1, such that for a block having an original dynamic range of 0→255 grayscale values, at a 1:1 dynamic range, the block has 0→255 grayscale values, at a 2:1 dynamic range, the block has 0→127 grayscale values, at a 4:1 dynamic range, the block has 0→63 grayscale values, and at an 8:1 dynamic range, the block has 0→31 grayscale values.
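The spatial scale and dynamic range settings described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the helper names `downscale` and `reduce_dynamic_range` are hypothetical, the spatial reduction is assumed to be a simple box average, and the dynamic range reduction is assumed to be integer division of each grayscale value. The disclosure does not prescribe either choice.

```python
def downscale(block, factor):
    """Assumed box-average scaler: each factor x factor group becomes one pixel."""
    size = len(block) // factor
    area = factor * factor
    return [
        [
            sum(block[r * factor + i][c * factor + j]
                for i in range(factor) for j in range(factor)) // area
            for c in range(size)
        ]
        for r in range(size)
    ]


def reduce_dynamic_range(block, factor):
    """Assumed dynamic range modulator: integer-divide every grayscale value."""
    return [[pixel // factor for pixel in row] for row in block]


# A synthetic 16x16 block whose pixels cover every grayscale value 0..255.
block = [[r * 16 + c for c in range(16)] for r in range(16)]

for factor in (1, 2, 4, 8):
    scaled = downscale(block, factor)
    ranged = reduce_dynamic_range(block, factor)
    top = max(max(row) for row in ranged)
    print(f"{factor}:1 -> {len(scaled)}x{len(scaled)} pixels, values 0..{top}")
```

With this synthetic block, the four settings yield 16×16, 8×8, 4×4, and 2×2 pixels and maximum grayscale values of 255, 127, 63, and 31, matching the ranges given in the text.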

In some embodiments, the characterization module 115 low-pass filters the signal for each block before or during the scaling process to prevent aliasing. For example, in some embodiments, the characterization module 115 employs a 4-tap or 8-tap finite impulse response (FIR) filter which effectively performs low-pass filtering using a corresponding number of appropriate coefficients prior to decimation. The filtering causes blurring, which may or may not cause information to be lost, depending on the amount of detail in the block. In some embodiments, the characterization module 115 uses a recursive method in which the imagery within each block is scaled for each successive spatial scale setting as it was in the previous spatial scale setting.

At each spatial scale setting and dynamic range setting, the characterization module 115 calculates the pixel activity level for the block using a pixel activity metric. In some embodiments, the characterization module 115 calculates pixel activity for each block using a 2D spatial mean gradient. A 2D spatial mean gradient captures vertical and horizontal edges. In some embodiments, the characterization module 115 calculates pixel activity of each block using a wavelet transform or other transform to measure an activity parameter for a given block. Thus, the characterization module 115 measures the amount of information (if any) that is lost at each progressive level of spatial scaling and at each dynamic range setting.

In some embodiments, the characterization module 115 generates a multi-scale metric for each block that is an N-tuple number such as a matrix representing the N pixel activity levels calculated by the characterization module 115 at each progressive level of spatial scaling and at each dynamic range setting. In some embodiments, the characterization module 115 uses normalized pixel activity level values (e.g., values that are normalized with respect to the maximum pixel activity value), which can be represented by a floating-point number or a fixed-point number. In some embodiments, the characterization module 115 generates a multi-scale metric based on the difference in values at different spatial scales and/or dynamic ranges.
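A multi-scale metric of this kind can be sketched as a 4×4 matrix of normalized gradients. The sketch below rests on several assumptions: the function names are hypothetical, spatial scaling is a box average, dynamic range reduction is integer division, and pixel activity is the mean absolute difference between horizontally and vertically adjacent pixels, normalized by the largest value available at the current dynamic range. Other activity metrics and normalizations are equally possible.

```python
def downscale(block, factor):
    """Assumed box-average scaler: each factor x factor group becomes one pixel."""
    size = len(block) // factor
    area = factor * factor
    return [
        [
            sum(block[r * factor + i][c * factor + j]
                for i in range(factor) for j in range(factor)) // area
            for c in range(size)
        ]
        for r in range(size)
    ]


def mean_gradient(block, max_value):
    """Mean absolute difference between adjacent pixels, normalized to 0..1."""
    n = len(block)
    total = 0
    pairs = 0
    for r in range(n):
        for c in range(n):
            if c + 1 < n:  # horizontal neighbor
                total += abs(block[r][c] - block[r][c + 1])
                pairs += 1
            if r + 1 < n:  # vertical neighbor
                total += abs(block[r][c] - block[r + 1][c])
                pairs += 1
    if pairs == 0 or max_value == 0:
        return 0.0
    return total / (pairs * max_value)


def multi_scale_metric(block, scales=(1, 2, 4, 8)):
    """Rows follow the dynamic range settings, columns the spatial scale settings."""
    metric = []
    for dynamic in scales:
        row = []
        for spatial in scales:
            scaled = downscale(block, spatial)
            ranged = [[pixel // dynamic for pixel in line] for line in scaled]
            row.append(mean_gradient(ranged, 255 // dynamic))
        metric.append(row)
    return metric


# A 16x16 block alternating between 0 and 255 at every pixel (fine detail only).
fine = [[255 * ((r + c) % 2) for c in range(16)] for r in range(16)]
metric = multi_scale_metric(fine)
print(metric[0])  # activity at 1:1 dynamic range across the four spatial scales
```

For this fine-detail block, the first row is `[1.0, 0.0, 0.0, 0.0]`: the activity is maximal at the 1:1 spatial scale and vanishes as soon as the block is averaged down.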

Thus, for a 16-tuple multi-scale metric representing pixel activitylevels measured for a block at four spatial scale settings and fourdynamic range settings, the multi-scale metric in some embodimentsrepresents the information described below in Table 1.

TABLE 1

Pixels under study | 1:1 spatial scale | 2:1 spatial scale | 4:1 spatial scale | 8:1 spatial scale
1:1 dynamic scale  | 16 × 16           | 8 × 8             | 4 × 4             | 2 × 2
2:1 dynamic scale  | 16 × 16           | 8 × 8             | 4 × 4             | 2 × 2
4:1 dynamic scale  | 16 × 16           | 8 × 8             | 4 × 4             | 2 × 2
8:1 dynamic scale  | 16 × 16           | 8 × 8             | 4 × 4             | 2 × 2

From left to right, the components of the multi-scale metric as depicted in Table 1 reflect the contribution of details from fine to coarse. From top to bottom, the components of the multi-scale metric as depicted in Table 1 reflect the contribution of details from all contrast levels to high contrast only. The contributions of details from fine to coarse and from all contrast levels to high contrast only relate directly to the discrete cosine transform (DCT) and discrete sine transform (DST) or wavelet transforms that underlie many video and image compression algorithms. The pre-processing module 110 uses the multi-scale metric or another measure of characteristics of blocks to train a regressive model or machine learning model to select a filter for pre-processing each block based on budgetary or perceptual targets.

Each of the plurality of filters 120 selectively removes information from the blocks to which it is applied. In some embodiments, each of the plurality of filters 120 is the same type of filter but is adjusted to different settings than the other filters of the plurality of filters 120. In some embodiments, each filter is a different type of filter than the other filters. For example, in some embodiments, each of the plurality of filters 120 is a bilateral blurring filter having different control settings. In other embodiments, a first filter of the plurality of filters 120 is one of a bilateral blurring filter, a temporal filter, a spatio-temporal filter, a motion-compensated filter, or other type of filter, and a second filter of the plurality of filters 120 is a different one of a bilateral blurring filter, a temporal filter, a spatio-temporal filter, a motion-compensated filter, or other type of filter. A bilateral blurring filter is a non-linear, edge-preserving, noise-reducing smoothing filter that replaces the intensity of each pixel with a weighted average of intensity values from nearby pixels. A temporal filter performs a weighted average of successive frames. A spatio-temporal filter compresses images and videos by removing spatial, temporal, and visual redundancies. A motion-compensated filter predicts a frame in a video, given the previous or future frames, by accounting for motion of the camera or objects in the video.

In some embodiments, each of the plurality of filters 120 is a bilateral blurring filter that applies a Gaussian blur within a given radius. Each filter 120 has a control setting adjustable by the filter control 125 to set a threshold value for a difference in values, such as luminance, between a sample pixel (also referred to as a pixel under study) compared to nearby candidate pixels in order for a candidate pixel to be included in a blur with the pixel under study. The filter control 125 further adjusts a control setting for each filter 120 to set a threshold distance between the pixel under study and candidate pixels (that also meet the difference threshold) for inclusion in the blur. Thus, each filter 120 is a bilateral blurring filter having different control settings for a difference threshold and distance threshold. For example, in some embodiments, a first filter 120 is a bilateral blurring filter with a control setting for the threshold difference in values allowing for a relatively small difference in values between pixels (e.g., less than 5%) for inclusion in a blur and a control setting for the distance threshold between pixels allowing for pixels within a radius of X pixels from a sample pixel (and also meeting the difference threshold) to be included in the blur. A second filter 120 in some embodiments is a bilateral blurring filter with a control setting for the threshold difference in values between pixels (e.g., less than 25%) allowing for a relatively larger difference in values between pixels for inclusion in the blur and a control setting for the threshold distance between pixels allowing for pixels within a radius of Y pixels from the sample pixel (and also meeting the difference threshold) to be included in the blur. In some embodiments, the filter control 125 applies different control settings to the filters 120 to effectively apply N filters to each block. In some embodiments, the differences in control settings for each of the N filters are monotonic, and in other embodiments the differences in control settings for each of the N filters are not monotonic.
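The interaction of the difference threshold and the distance threshold can be illustrated with a simplified filter. The sketch below is an assumption-laden stand-in rather than the disclosed filter: the function name `threshold_blur` is hypothetical, the neighborhood is a square window rather than a circular radius, and the candidate pixels that pass both thresholds are averaged with equal weight instead of Gaussian weights.

```python
def threshold_blur(block, radius, max_diff):
    """Average each pixel with neighbors that pass both thresholds.

    radius   -- distance threshold (square window half-width, an assumption)
    max_diff -- difference threshold on pixel values
    """
    height, width = len(block), len(block[0])
    out = [row[:] for row in block]
    for r in range(height):
        for c in range(width):
            center = block[r][c]  # the pixel under study
            total = 0
            count = 0
            for rr in range(max(0, r - radius), min(height, r + radius + 1)):
                for cc in range(max(0, c - radius), min(width, c + radius + 1)):
                    # Candidate pixels too different from the pixel under
                    # study are excluded from the blur, preserving edges.
                    if abs(block[rr][cc] - center) <= max_diff:
                        total += block[rr][cc]
                        count += 1
            out[r][c] = total // count  # count >= 1: the center always passes
    return out


# A block with a sharp vertical edge between dark (10, 12) and bright (200, 202).
edge_block = [[10, 12, 200, 202] for _ in range(4)]

tight = threshold_blur(edge_block, radius=1, max_diff=12)   # roughly 5% of 255
loose = threshold_blur(edge_block, radius=1, max_diff=255)  # no difference limit
print(tight[0])  # [11, 11, 201, 201]: noise smoothed, edge preserved
print(loose[0])  # edge smeared across the boundary
```

Varying `radius` and `max_diff` in this way yields the effect of N distinct filters from a single filter implementation, which is the role the text assigns to the filter control 125.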

After the pre-processing module 110 applies a first filter of the filters 120 to a block 105 to generate a filtered block 145, the pre-processing module 110 encodes the filtered block 145, either using an encoder of the pre-processing module (not shown) or by providing the filtered block to an encoder 150 external to the pre-processing module 110. The encoder 150 encodes (compresses) the filtered block 145 using a specific quantization parameter to generate an encoded filtered block 155. The predictor 130 collects the compressed size (bit usage) of the encoded filtered block 155 that has been filtered using the first filter and encoded using the specific quantization parameter and calculates an error metric such as mean squared error (MSE) for the encoded filtered block 155.

In some embodiments, the pre-processing module 110 characterizes, filters, encodes, and measures the sizes and errors of a large number of blocks (e.g., millions of blocks) at training time using each of the N filters 120. Based on the characteristics of each block and the collected sizes and errors of each block when filtered and encoded with each of the N filters, the predictor 130 develops a regressive model using least squares, random forest regressor, or other machine learning techniques to output a predicted size of a block when filtered with each of the N filters 120 and encoded using a given quantization parameter, based on the block's characteristics. The predictor 130 also develops a regressive model using least squares, random forest regressor, or other machine learning techniques to output a predicted error of a block when filtered with each of the N filters 120 and encoded using a given quantization parameter, based on the block's characteristics. The pre-processing module 110 trains the size and error models for each of the N filters 120 and for each quantization parameter of interest.
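As one hedged illustration of the least squares option, the following sketch fits a linear size model for a single filter and quantization parameter using only the Python standard library. Everything specific here is an assumption made for illustration: the function names, the two-component feature vectors, and especially the training data, which is synthetic and generated from a known linear rule so that the fit can be checked. A real system would fit one size model and one error model per filter and per quantization parameter from measured encoder output.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_least_squares(features, targets):
    """Fit targets ~ w0 + w . features through the normal equations."""
    rows = [[1.0] + list(f) for f in features]  # prepend a bias term
    n = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(n)]
    return solve(xtx, xty)


def predict(weights, feature):
    """Evaluate the fitted linear model on one feature vector."""
    return weights[0] + sum(w * f for w, f in zip(weights[1:], feature))


# Synthetic block characteristics (stand-ins for multi-scale metric components).
features = [(0.1, 0.9), (0.5, 0.5), (0.9, 0.2), (0.3, 0.7), (0.8, 0.8)]
# Synthetic encoded sizes generated from a made-up rule, not measured data.
sizes = [100 + 400 * a + 50 * b for a, b in features]

# One model per (filter index, quantization parameter); only one pair is shown.
size_models = {(0, 30): fit_least_squares(features, sizes)}
print(round(predict(size_models[(0, 30)], (0.4, 0.6)), 3))
```

An error model would be fitted the same way with measured errors (for example, MSE values) as the targets. A random forest regressor or another learner could replace the linear fit without changing the surrounding structure.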

After training the models, to pre-process a picture of interest, the processing system 100 divides the picture into blocks. The characterization module 115 characterizes each block of the picture, and the predictor 130 uses the size model and error model to calculate a predicted size and a predicted error for the block for each of the N filters 120, given each block's characterization. The cost calculator 135 calculates a cost function for each of the N filters for each block based on the predicted sizes and errors of the block, given the block's characterization. For example, in some embodiments in which n is a particular filter 120, multiscale metric coefficients are the multi-scale metric results for the block under study, and QP is the quantization parameter, the cost function is

$\mathrm{Cost} = \left(\mathrm{size}_{\mathrm{model}}(n, \text{multiscale metric coefficients}, \mathrm{QP})\right)^{2} \cdot \sqrt{\mathrm{error}_{\mathrm{model}}(n, \text{multiscale metric coefficients}, \mathrm{QP})}$

Based on the cost function for each of the N filters 120 (and, in some embodiments, for each quantization parameter of interest), the filter selector 140 selects the filter having the best (lowest) cost to be applied to each block, based on the block's characterization. The pre-processing module 110 repeats the pre-processing for each block of the picture, applying the selected filter 120 to each block of the picture. The pre-processing module 110 provides the filtered picture to the encoder 150 for encoding.
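The cost calculation and lowest-cost selection can be sketched as follows. The cost expression is the one given above (predicted size squared, multiplied by the square root of the predicted error). The function names and the predicted values in the example are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def cost(predicted_size, predicted_error):
    """Cost = size^2 * sqrt(error), following the cost function in the text."""
    return predicted_size ** 2 * math.sqrt(predicted_error)


def select_filter(predictions):
    """Return the filter index with the lowest cost.

    predictions maps a filter index n to (predicted_size, predicted_error)
    for one block at one quantization parameter.
    """
    return min(predictions, key=lambda n: cost(*predictions[n]))


# Hypothetical model outputs for one block and three candidate filters.
predictions = {
    0: (1000, 4.0),   # cost 1000^2 * 2 = 2,000,000
    1: (800, 9.0),    # cost  800^2 * 3 = 1,920,000
    2: (600, 25.0),   # cost  600^2 * 5 = 1,800,000
}
best = select_filter(predictions)
print(best, cost(*predictions[best]))
```

In this made-up example, filter 2 is selected even though it has the largest predicted error, because squaring the size term weights bit savings more heavily than distortion. For embodiments that apply a filter only when its cost is below a threshold, the `min` call could be replaced with a comparison against that threshold.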

FIG. 2 illustrates the effects of varying the spatial scales of two sample blocks 205 and 255 for purposes of calculating a multi-scale metric in accordance with some embodiments. Blocks 205 and 255 are illustrated at a 1:1 spatial scale. Block 205 is a checkerboard pattern of four squares, with black squares at the upper left and lower right quadrants and light gray squares at the upper right and lower left quadrants. Block 255 is a checkerboard pattern of 256 squares, with 16 columns and 16 rows of alternating black and light gray squares.

When the spatial scale for blocks 205 and 255 is adjusted to a 2:1 spatial scale, resulting in blocks 210 and 260, respectively, block 210 retains the perceptual characteristics of block 205, in that block 210 also appears as a checkerboard pattern of four squares, with black squares at the upper left and lower right quadrants and light gray squares at the upper right and lower left quadrants. By contrast, at a 2:1 reduction in spatial scale, the checkerboard pattern of block 255 is no longer apparent in block 260.

When the spatial scale for blocks 205 and 255 is further adjusted to a 4:1 spatial scale, resulting in blocks 215 and 265, respectively, block 215 still retains the perceptual characteristics of block 205, whereas block 265 appears to be a flat gray square. Similarly, when the spatial scale for blocks 205 and 255 is adjusted to an 8:1 spatial scale, resulting in blocks 220 and 270, the checkerboard pattern can still be seen in block 220, whereas block 270 appears to be a flat gray square, retaining none of the fine detail of block 255.

A multi-scale metric reflecting the four spatial scale settings (1:1, 2:1, 4:1, and 8:1) shown in FIG. 2 at a single dynamic range setting is a 4-tuple. Assuming that the pixel activity is a 2D spatial gradient having a value between 0 and 1, with 0 indicating no vertical or horizontal edges and 1 indicating a maximum amount of vertical and horizontal edges, the pixel activity value for block 205, which is an 8×8 pixel checkerboard pattern, is 0.125, because the pattern has 1/8 of the maximum number of transitions for its size. The pixel activity value for block 210, which is the 8×8 checkerboard pattern of block 205 scaled 2:1, is 0.25, because the pattern has 1/4 of the maximum number of transitions for its size. The pixel activity value for block 215, which is the 8×8 checkerboard pattern of block 205 scaled 4:1, is 0.5, because the pattern has half of the maximum number of transitions for its size. The pixel activity value for block 220, which is the 8×8 checkerboard pattern of block 205 scaled 8:1, is 1.0, because the pattern has a maximum number of transitions for its size. Thus, the multi-scale metric for block 205, at the spatial scales illustrated as blocks 205, 210, 215, and 220, is represented as [0.125, 0.25, 0.5, 1].

Block 255, by contrast, is a 1×1 pixel checkerboard pattern which has a pixel activity value of 1.0, because the pattern has a maximum number of transitions for its size. Block 260, which has the 1×1 checkerboard pattern of block 255 scaled 2:1, has a pixel activity value of 0, because the low-pass filtering of the scaling affects the pattern of the block 260 to the point that there is no activity in the signal. Blocks 265 and 270, which have the 1×1 checkerboard pattern of block 255 scaled 4:1 and 8:1, respectively, also have pixel activity values of 0, because there is no activity in the signals. Thus, the multi-scale metric for block 255, at the spatial scales illustrated as blocks 255, 260, 265, and 270, is represented as [1, 0, 0, 0]. The multi-scale metric of [0.125, 0.25, 0.5, 1] indicates that the spatial gradient of block 205 doubles at each spatial scale and is therefore dominated by coarse detail that is not diminished by a reduction in spatial scale. By contrast, the multi-scale metric of [1, 0, 0, 0] indicates that the gradient of block 255 is affected by a change in spatial scale, and therefore includes a significant amount of fine detail. Thus, by incorporating measures of 2D spatial gradients or other metrics of pixel activity at a plurality of spatial scales, the multi-scale metric provides an indication of the contribution of fine and coarse details to a block.
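The contrast between the two checkerboards can be reproduced with a small experiment. The sketch below makes several assumptions: the blocks are 16×16 pixels, light gray is the value 192, scaling is a box average, and pixel activity is measured as the fraction of adjacent pixel pairs whose values differ. Because this normalization is not necessarily the one used for the values quoted above, the coarse block's numbers differ from [0.125, 0.25, 0.5, 1], although the trend is the same. All function names are hypothetical.

```python
def checkerboard(size, square, light=192):
    """Checkerboard with black at the upper left and squares of the given width."""
    return [
        [0 if ((r // square) + (c // square)) % 2 == 0 else light
         for c in range(size)]
        for r in range(size)
    ]


def downscale(block, factor):
    """Assumed box-average scaler: each factor x factor group becomes one pixel."""
    size = len(block) // factor
    area = factor * factor
    return [
        [
            sum(block[r * factor + i][c * factor + j]
                for i in range(factor) for j in range(factor)) // area
            for c in range(size)
        ]
        for r in range(size)
    ]


def transition_fraction(block):
    """Fraction of horizontally or vertically adjacent pairs that differ."""
    n = len(block)
    differing = 0
    pairs = 0
    for r in range(n):
        for c in range(n):
            if c + 1 < n:
                differing += block[r][c] != block[r][c + 1]
                pairs += 1
            if r + 1 < n:
                differing += block[r][c] != block[r + 1][c]
                pairs += 1
    return differing / pairs if pairs else 0.0


coarse_block = checkerboard(16, 8)  # four 8x8 squares, like block 205
fine_block = checkerboard(16, 1)    # 256 1x1 squares, like block 255

coarse = [transition_fraction(downscale(coarse_block, f)) for f in (1, 2, 4, 8)]
fine = [transition_fraction(downscale(fine_block, f)) for f in (1, 2, 4, 8)]
print([round(v, 3) for v in coarse])
print(fine)
```

Under these assumptions the fine block yields [1.0, 0.0, 0.0, 0.0], while the coarse block's activity rises steadily and reaches 1.0 at the 8:1 scale, which is the qualitative behavior the text attributes to blocks 255 and 205 respectively.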

FIG. 3 illustrates the effects of varying the dynamic ranges of two sample blocks 305 and 355 for purposes of calculating a multi-scale metric in accordance with some embodiments. Blocks 305 and 355 are illustrated at a 1:1 dynamic range having 256 grayscale values of 0→255. Block 305 is a checkerboard pattern of four squares, with black squares at the upper left and lower right quadrants and light gray squares at the upper right and lower left quadrants. Block 355 is a checkerboard pattern of four squares, with black squares at the upper left and lower right quadrants and dark gray squares at the upper right and lower left quadrants.

When the dynamic range for blocks 305 and 355 is adjusted to a 2:1 dynamic range scale having 128 grayscale values of 0→127, resulting in blocks 310 and 360, respectively, the light gray squares of block 310 become relatively lighter, while the black squares remain black. Thus, with a 2:1 reduction in dynamic range, the gradient of block 310 is higher than the gradient of block 305. By contrast, at a 2:1 reduction in dynamic range, the gradient of block 360 is lower than the gradient of block 355, although it is still possible to discern a checkerboard pattern of block 360.

When the dynamic range for blocks 305 and 355 is further adjusted to a 4:1 dynamic range scale having 64 grayscale values of 0→63, resulting in blocks 315 and 365, respectively, the gray squares of block 305 have become nearly white as shown in block 315, while the black squares have remained black. At a 4:1 reduction in dynamic range, the gray squares of block 355 have become essentially black in block 365, such that the gradient of block 365 approaches zero. Similarly, when the dynamic range for blocks 305 and 355 is adjusted to an 8:1 dynamic range scale having 32 grayscale values of 0→31, resulting in blocks 320 and 370, the gradient of block 320 increases to a maximum value, whereas block 370 appears to be a flat black square of zero gradient.

A multi-scale metric reflecting the four dynamic range settings (1:1, 2:1, 4:1, and 8:1) shown in FIG. 3 at a single spatial scale setting is a 4-tuple. Assuming that the pixel activity is a 2D spatial gradient having a value between 0 and 1, with 0 indicating no vertical or horizontal edges and 1 indicating a maximum amount of vertical and horizontal edges, the multi-scale metric for block 305 (at the dynamic ranges illustrated as blocks 305, 310, 315, and 320) is represented as [0.8, 0.9, 1.0, 1.0], and the multi-scale metric for block 355 (at the dynamic ranges illustrated as blocks 355, 360, 365, and 370) is represented as [0.2, 0.1, 0, 0]. The multi-scale metric of [0.8, 0.9, 1.0, 1.0] indicates that the attenuation of high frequencies is more likely to be noticed for block 305, whereas the multi-scale metric of [0.2, 0.1, 0, 0] indicates that the attenuation of high frequencies is less likely to be noticed for block 355. Thus, by incorporating measures of 2D spatial gradients or other metrics of pixel activity at a plurality of dynamic ranges, the multi-scale metric provides an indication of the contribution of details from all contrast levels versus from only high contrast levels.

FIG. 4 illustrates the filter control 125 of the pre-processing module 110 applying N bilateral blurring filters 120 having different settings to a portion of a picture in accordance with some embodiments. The filter control 125 includes a distance selector 440 and a difference selector 445, each of which may be implemented as hard-coded logic, programmable logic, software executed by a processor, or a combination thereof. By adjusting the distance threshold of the bilateral blurring filters 120 with the distance selector 440 and the difference threshold of the bilateral blurring filters 120 with the difference selector 445, the filter control 125 effectively generates a different filter 120 for each set of distance threshold and difference threshold settings. Other embodiments employ other types of filters having additional or different parameters.

After the characterization module 115 has characterized a block, the filter control 125 applies a plurality of filters 120 to the block. To illustrate, for a block 425 including pixels 400-424, of which pixel 412 is the pixel under study (i.e., the pixel with which other pixels in the block will potentially be blurred), the filter control 125 applies a first filter 120 having a first distance threshold and a first difference threshold. When applied to the block 425 by the pre-processing module 110, the first filter 120 blurs pixels 406, 407, 408, 411, 413, 416, 417, and 418 in the area 435 with the pixel 412 under study. When the filter control 125 applies a second filter 120 having a second distance threshold and a second difference threshold to the block 425, the second filter 120 blurs pixels 402, 403, 404, 407, 408, 409, 413, and 414 in the area 430 with the pixel 412 under study.
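The effect of the two thresholds can be sketched with a simplified, hard-threshold blur. This is an assumption-laden illustration rather than the filter 120 itself: the function names are hypothetical, the window is centered on the pixel under study (unlike the area 430 described above), and candidate pixels are averaged with equal weight instead of the distance- and difference-dependent weights that a bilateral filter typically uses.

```python
def blur_pixel(block, row, col, distance_threshold, difference_threshold):
    """Average the pixel under study with every candidate pixel that lies
    within distance_threshold rows/columns of it and whose value differs
    from it by at most difference_threshold."""
    center = block[row][col]
    selected = []
    row_end = min(len(block), row + distance_threshold + 1)
    col_end = min(len(block[0]), col + distance_threshold + 1)
    for r in range(max(0, row - distance_threshold), row_end):
        for c in range(max(0, col - distance_threshold), col_end):
            if abs(block[r][c] - center) <= difference_threshold:
                selected.append(block[r][c])
    # The pixel under study always qualifies, so `selected` is never empty.
    return sum(selected) / len(selected)


def apply_filter(block, distance_threshold, difference_threshold):
    """Blur every pixel of the block with one pair of threshold settings."""
    return [[blur_pixel(block, r, c, distance_threshold, difference_threshold)
             for c in range(len(block[0]))]
            for r in range(len(block))]


# Hypothetical 5x5 block: flat gray, a slightly brighter center pixel,
# and one bright outlier next to the center.
block = [[100] * 5 for _ in range(5)]
block[2][2] = 109
block[1][1] = 250

narrow = blur_pixel(block, 2, 2, 1, 20)    # outlier excluded by the difference threshold
wide = blur_pixel(block, 2, 2, 1, 255)     # outlier included
unblurred = blur_pixel(block, 2, 2, 0, 20)  # distance 0 keeps only the pixel itself
```

Each distinct pair of settings behaves as a different filter: tightening the difference threshold preserves edges such as the bright outlier, while widening either threshold pulls more pixels into the blur.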

FIG. 5 is a flow diagram illustrating a method 500 for training models to predict sizes and distortion (referred to as errors) of blocks filtered with different filters in accordance with some embodiments. The method 500 is implemented in some embodiments of the processing system 100 shown in FIG. 1. At block 502, the characterization module 115 characterizes each block of a plurality of blocks. In some embodiments, the characterization module 115 characterizes the blocks using a multi-scale metric. In other embodiments, the characterization module 115 characterizes the blocks based on other metrics such as colorfulness, contrast, or noisiness. At block 504, the filter control 125 applies a first filter 120 to each block. At block 506, the pre-processing module 110 encodes the filtered blocks using a quantization parameter. At block 508, the predictor 130 collects (or calculates) a size and error for each encoded filtered block. At block 510, the predictor 130 develops a regression model to predict a size of a block when filtered with the first filter and encoded using the quantization parameter and a regression model to predict an error of a block when filtered with the first filter and encoded using the quantization parameter. At block 512, the pre-processing module 110 determines whether the predictor 130 has developed size and error models for all of the N filters being evaluated.

If, at block 512, the pre-processing module 110 determines that the predictor 130 has not developed a size and error model for each filter being evaluated, the method flow continues to block 514. At block 514, the filter control 125 applies the next filter 120 to each of the plurality of blocks. From there, the method flow continues back to block 506. If, at block 512, the pre-processing module 110 determines that the predictor 130 has developed a size and error model for each of the N filters being evaluated, the method flow continues to block 516. At block 516, the pre-processing module 110 uses the size and error models developed for each of the N filters 120 for pre-processing a picture or video, as described further in FIG. 6.
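The training loop of method 500 can be sketched as below, under several simplifying assumptions that are not taken from the description: each block is characterized by a single scalar, the regression is an ordinary least-squares line, and `toy_filter_and_encode` is a hypothetical stand-in for filtering and encoding at a fixed quantization parameter. All function names and numeric values are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def train_models(blocks, filters, characterize, filter_and_encode):
    """Develop one size model and one error model per filter."""
    features = [characterize(b) for b in blocks]            # block 502
    models = {}
    for name, filt in filters.items():                      # blocks 504, 512, 514
        results = [filter_and_encode(filt, b) for b in blocks]  # blocks 504-508
        sizes = [size for size, _ in results]
        errors = [error for _, error in results]
        models[name] = {                                    # block 510
            "size": fit_line(features, sizes),
            "error": fit_line(features, errors),
        }
    return models


def toy_filter_and_encode(strength, activity):
    """Hypothetical stand-in for filtering and encoding: stronger blurring
    shrinks the encoded size and adds error in proportion to block activity."""
    size = 100 * activity * (1 - strength) + 10
    error = 8 * activity * strength
    return size, error


# Hypothetical training set: each "block" is reduced to its activity value.
training_blocks = [0.1, 0.4, 0.7, 1.0]
filters = {"weak": 0.25, "strong": 0.75}
models = train_models(training_blocks, filters,
                      characterize=lambda activity: activity,
                      filter_and_encode=toy_filter_and_encode)
```

In practice the characterization would typically be a vector (for example, a multi-scale metric), in which case a multivariate regression would replace `fit_line`; the per-filter loop structure stays the same.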

FIG. 6 is a flow diagram illustrating a method 600 for filtering blocks of a picture using a filter having a lowest cost for each block in accordance with some embodiments. The method 600 is implemented in some embodiments of the processing system 100 shown in FIG. 1. At block 602, the processing system 100 divides the picture into blocks. At block 604, the characterization module 115 characterizes a first block of the picture using the same characterization method that was used in the training step for training the regression models, as discussed in reference to FIG. 5. At block 606, the predictor 130 calculates a predicted size of the first block when filtered using each of the N filters 120 by inputting the characterization of the first block into each of the N size models (one size model developed for each of N filters 120). The predictor 130 also calculates a predicted error of the first block when filtered using each of the N filters 120 by inputting the characterization of the first block into each of the N error models (one error model developed for each of N filters 120).

At block 608, the cost calculator 135 calculates a cost associated with applying each of the N filters 120 based on the predicted size and predicted error of a block having the characteristics of the first block when filtered using each filter 120. In some embodiments, the cost is based on a cost function that is the product of the square of the size model and the square root of the error model. At block 610, the filter selector 140 selects the filter predicted to incur the lowest cost for the first block. At block 612, the pre-processing module 110 applies the lowest cost filter to the first block. At block 614, the pre-processing module 110 determines whether every block of the picture (or video) of interest has been filtered. If, at block 614, the pre-processing module 110 determines that not every block of the picture has been filtered, the method flow continues to block 616. At block 616, the pre-processing module 110 continues to the next block of the picture, and the method flow continues back to block 604. If, at block 614, the pre-processing module 110 determines that every block of the picture has been filtered, the method flow continues to block 618. At block 618, the pre-processing module 110 provides the filtered picture to the encoder 150.
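The cost calculation and selection at blocks 606 through 610 can be sketched as follows, using the cost function named above (the square of the predicted size multiplied by the square root of the predicted error). The linear (slope, intercept) model form, the function names, and every coefficient are hypothetical values chosen for illustration, not results from the description.

```python
import math


def predict(model, feature):
    """Evaluate an assumed linear model given as (slope, intercept)."""
    slope, intercept = model
    return slope * feature + intercept


def filter_cost(predicted_size, predicted_error):
    """Cost = size squared times the square root of error."""
    return predicted_size ** 2 * math.sqrt(max(predicted_error, 0.0))


def select_filter(models, feature):
    """Return the name of the lowest-cost filter and all computed costs."""
    costs = {
        name: filter_cost(predict(m["size"], feature),
                          predict(m["error"], feature))
        for name, m in models.items()
    }
    return min(costs, key=costs.get), costs


# Hypothetical per-filter models; the scalar feature stands in for a
# block characterization such as pixel activity.
example_models = {
    "weak": {"size": (75.0, 10.0), "error": (2.0, 1.0)},
    "strong": {"size": (25.0, 10.0), "error": (60.0, 1.0)},
}

choice_low, costs_low = select_filter(example_models, 0.1)
choice_high, costs_high = select_filter(example_models, 0.8)
```

With these illustrative coefficients the selection changes with the block characterization: the weak filter has the lower cost at a feature value of 0.1, and the strong filter has the lower cost at 0.8. Because size is squared, this cost function weights predicted size much more heavily than predicted error.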

A computer readable storage medium may include any non-transitory storage medium, or combination of non-transitory storage media, accessible by a computer system during use to provide instructions and/or data to the computer system. Such storage media can include, but are not limited to, optical media (e.g., compact disc (CD), digital versatile disc (DVD), Blu-Ray disc), magnetic media (e.g., floppy disc, magnetic tape, or magnetic hard drive), volatile memory (e.g., random access memory (RAM) or cache), non-volatile memory (e.g., read-only memory (ROM) or Flash memory), or microelectromechanical systems (MEMS)-based storage media. The computer readable storage medium may be embedded in the computing system (e.g., system RAM or ROM), fixedly attached to the computing system (e.g., a magnetic hard drive), removably attached to the computing system (e.g., an optical disc or Universal Serial Bus (USB)-based Flash memory), or coupled to the computer system via a wired or wireless network (e.g., network accessible storage (NAS)).

In some embodiments, certain aspects of the techniques described above may be implemented by one or more processors of a processing system executing software. The software includes one or more sets of executable instructions stored or otherwise tangibly embodied on a non-transitory computer readable storage medium. The software can include the instructions and certain data that, when executed by the one or more processors, manipulate the one or more processors to perform one or more aspects of the techniques described above. The non-transitory computer readable storage medium can include, for example, a magnetic or optical disk storage device, solid state storage devices such as Flash memory, a cache, random access memory (RAM), or other non-volatile memory device or devices, and the like. The executable instructions stored on the non-transitory computer readable storage medium may be in source code, assembly language code, object code, or other instruction format that is interpreted or otherwise executable by one or more processors.

Note that not all of the activities or elements described above in the general description are required, that a portion of a specific activity or device may not be required, and that one or more further activities may be performed, or elements included, in addition to those described. Still further, the order in which activities are listed is not necessarily the order in which they are performed. Also, the concepts have been described with reference to specific embodiments. However, one of ordinary skill in the art appreciates that various modifications and changes can be made without departing from the scope of the present disclosure as set forth in the claims below. Accordingly, the specification and figures are to be regarded in an illustrative rather than a restrictive sense, and all such modifications are intended to be included within the scope of the present disclosure.

Benefits, other advantages, and solutions to problems have been described above with regard to specific embodiments. However, the benefits, advantages, solutions to problems, and any feature(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as a critical, required, or essential feature of any or all the claims. Moreover, the particular embodiments disclosed above are illustrative only, as the disclosed subject matter may be modified and practiced in different but equivalent manners apparent to those skilled in the art having the benefit of the teachings herein. No limitations are intended to the details of construction or design herein shown, other than as described in the claims below. It is therefore evident that the particular embodiments disclosed above may be altered or modified and all such variations are considered within the scope of the disclosed subject matter. Accordingly, the protection sought herein is as set forth in the claims below.

What is claimed is:
1. A method comprising: characterizing each block of a plurality of blocks of one or more pictures based on metrics comprising at least one of spatial gradient, colorfulness, contrast, or noisiness; applying a first filter to filter each block of the plurality of blocks; encoding each of the plurality of blocks filtered with the first filter using a first quantization parameter; calculating a size and an error of each of the plurality of blocks encoded and filtered with the first filter; predicting a size and an error of a first block of a first picture based on a characterization of the first block and the characterizations and the sizes and errors of each of the plurality of blocks, wherein the first block is filtered with the first filter and encoded using the first quantization parameter; and applying the first filter to the first block in response to a cost of the first filter, based on the predicted size and error, being below a threshold.
2. The method of claim 1, further comprising: applying a second filter to filter the plurality of blocks of the one or more pictures; encoding each of the plurality of blocks filtered with the second filter using the first quantization parameter; calculating a size and an error of each of the plurality of blocks encoded and filtered with the second filter; predicting a size and an error of the first block of the first picture based on a characterization of the first block and the characterizations and the sizes and errors of each of the plurality of blocks, wherein the first block is filtered with the second filter and encoded using the first quantization parameter; and applying the second filter to the first block in response to the cost of the first filter, exceeding a cost of the second filter, based on the predicted size and error of the first block wherein the first block is filtered with the second filter.
3. The method of claim 2, wherein the first filter and the second filter are blurring filters wherein applying the first filter comprises: modulating a first control setting indicating a difference in values between a sample pixel of the first block and a candidate pixel of the first block for inclusion in a blur comprising the sample pixel; and wherein applying the second filter comprises: modulating a second control setting indicating a distance between the sample pixel and the candidate pixel for inclusion in the blur comprising the sample pixel, wherein at least one of the first control setting and the second control setting of the first filter differs from the first control setting and the second control setting of the second filter.
4. The method of claim 2, wherein the first filter comprises one of a bilateral blurring filter, a temporal filter, a spatio-temporal filter, and a motion-compensated filter, and the second filter comprises one of a bilateral blurring filter, a temporal filter, a spatio-temporal filter, and a motion-compensated filter, different from the first filter.
5. The method of claim 2, further comprising: providing the first block filtered with the first filter or the second filter to an encoder.
6. The method of claim 5, wherein the cost function for the first filter is based on the characterizations, the calculated sizes, and the calculated errors of each of the plurality of blocks wherein the plurality of blocks are filtered with the first filter and encoded using the first quantization parameter and the cost function for the second filter is based on the characterizations, the calculated sizes, and the calculated errors of each of the plurality of blocks wherein the plurality of blocks are filtered with the second filter and encoded using the first quantization parameter.
7. The method of claim 1, wherein characterizing is based on a multi-scale metric for each of the plurality of blocks, the multi-scale metric comprising coefficients based on estimated levels of pixel activity of each block for at least one of a plurality of spatial compression scales and a plurality of dynamic range scales, the multi-scale metric indicating how bit allocation or assignment of a quantization parameter is predicted to affect perceptual quality of each block.
8. A method, comprising: calculating a cost of applying a first filter and a cost of applying a second filter different from the first filter to a first block of a picture, wherein the cost of the first filter is based on a characterization based on metrics comprising at least one of spatial gradient, colorfulness, contrast, or noisiness, first predicted size and first predicted error for the first block, wherein the first block is filtered using the first filter and encoded using a first quantization parameter, and the cost of the second filter is based on the characterization, second size and second error for the first block, wherein the first block is filtered using the second filter and encoded using the first quantization parameter; selecting the first filter or the second filter, based on the cost of the first filter and the cost of the second filter; applying the selected filter to filter the first block; and providing the filtered first block to an encoder for encoding the filtered first block.
9. The method of claim 8, wherein the first filter and the second filter are blurring filters comprising: a first control setting indicating a difference in values between a sample pixel of the first block and a candidate pixel of the first block for inclusion in a blur comprising the sample pixel; and a second control setting indicating a distance between the sample pixel and the candidate pixel for inclusion in the blur comprising the sample pixel, wherein at least one of the first control setting and the second control setting of the first filter differs from the first control setting and the second control setting of the second filter.
10. The method of claim 8, wherein each of the first filter and the second filter comprises one of a bilateral blurring filter, a temporal filter, a spatio-temporal filter, and a motion-compensated filter.
11. The method of claim 8, further comprising: characterizing the first block of the first picture.
12. The method of claim 8, further comprising: predicting the first size and the first error for the first block, wherein the first block is filtered using the first filter and encoded using the first quantization parameter; and predicting the second size and the second error for the first block, wherein the first block is filtered using the second filter and encoded using the first quantization parameter.
13. The method of claim 12, further comprising: training a model to predict the first size of the first block and a model to predict the first error of the first block based on the first filter and the first quantization parameter; and training a model to predict the second size of the first block and a model to predict the second error of the first block based on the second filter and the first quantization parameter.
14. The method of claim 8, wherein the characterization is based on a multi-scale metric for the first block, the multi-scale metric comprising coefficients based on estimated levels of pixel activity of the first block for at least one of a plurality of spatial compression scales and a plurality of dynamic range scales, the multi-scale metric indicating how bit allocation or assignment of a quantization parameter is predicted to affect perceptual quality of the first block.
15. A device, comprising: a cost calculator to estimate a cost of applying a first filter and a cost of applying a second filter different from the first filter to a first block of a picture, wherein the cost of the first filter is based on a characterization of the first block based on metrics comprising at least one of spatial gradient, colorfulness, contrast, or noisiness and a first predicted size and a first predicted error for the first block, wherein the first block is filtered using the first filter and encoded using a first quantization parameter, and the cost of the second filter is based on the characterization of the first block and a second size and a second error for the first block, wherein the first block is filtered using the second filter and encoded using the first quantization parameter; a filter selector to select the first filter or the second filter, based on the cost of the first filter and the cost of the second filter, and apply the selected filter to filter the first block; and an encoder to encode the filtered first block.
16. The device of claim 15, wherein the first filter and the second filter are blurring filters comprising: a first control setting indicating a difference in values between a sample pixel of the first block and a candidate pixel of the first block for inclusion in a blur comprising the sample pixel; and a second control setting indicating a distance between the sample pixel and the candidate pixel for inclusion in the blur comprising the sample pixel, wherein at least one of the first control setting and the second control setting of the first filter differs from the first control setting and the second control setting of the second filter.
17. The device of claim 15, wherein each of the first filter and the second filter comprises one of a bilateral blurring filter, a temporal filter, a spatio-temporal filter, and a motion-compensated filter.
18. The device of claim 15, further comprising: a characterization module to characterize the first block of the first picture.
19. The device of claim 18, wherein the characterization module is to characterize the first block based on a multi-scale metric for the first block, the multi-scale metric comprising coefficients based on estimated levels of pixel activity of the first block for at least one of a plurality of spatial compression scales and a plurality of dynamic range scales, the multi-scale metric indicating how bit allocation or assignment of a quantization parameter is predicted to affect perceptual quality of the first block.
20. The device of claim 19, further comprising: a predictor configured to: train a model to predict the first size of the first block and a model to predict the first error of the first block based on characterizations of a plurality of blocks filtered using the first filter and encoded using the first quantization parameter; and train a model to predict the second size of the first block and a model to predict the second error of the first block based on characterizations of the plurality of blocks filtered using the second filter and encoded using the first quantization parameter.